Goto

Collaborating Authors

 threshold model



Influence Maximization with $\varepsilon$-Almost Submodular Threshold Functions

Neural Information Processing Systems

Influence maximization is the problem of selecting k nodes in a social network to maximize their influence spread. The problem has been extensively studied but most works focus on the submodular influence diffusion models. In this paper, motivated by empirical evidences, we explore influence maximization in the nonsubmodular regime. In particular, we study the general threshold model in which a fraction of nodes have non-submodular threshold functions, but their threshold functions are closely upper-and lower-bounded by some submodular functions (we call them ฮต-almost submodular).


Complex contagions can outperform simple contagions for network reconstruction with dense networks or saturated dynamics

arXiv.org Machine Learning

Network scientists often use complex dynamic processes to describe network contagions, but tools for fitting contagion models typically assume simple dynamics. Here, we address this gap by developing a nonparametric method to reconstruct a network and dynamics from a series of node states, using a model that breaks the dichotomy between simple pairwise and complex neighborhood-based contagions. We then show that a network is more easily reconstructed when observed through the lens of complex contagions if it is dense or the dynamic saturates, and that simple contagions are better otherwise.


Efficient human-in-loop deep learning model training with iterative refinement and statistical result validation

arXiv.org Artificial Intelligence

Annotation and labeling of images are some of the biggest challenges in applying deep learning to medical data. Current processes are time and cost-intensive and, therefore, a limiting factor for the wide adoption of the technology. Additionally validating that measured performance improvements are significant is important to select the best model. In this paper, we demonstrate a method for creating segmentations, a necessary part of a data cleaning for ultrasound imaging machine learning pipelines. We propose a four-step method to leverage automatically generated training data and fast human visual checks to improve model accuracy while keeping the time/effort and cost low. We also showcase running experiments multiple times to allow the usage of statistical analysis. Poor quality automated ground truth data and quick visual inspections efficiently train an initial base model, which is refined using a small set of more expensive human-generated ground truth data. The method is demonstrated on a cardiac ultrasound segmentation task, removing background data, including static PHI. Significance is shown by running the experiments multiple times and using the student's t-test on the performance distributions. The initial segmentation accuracy of a simple thresholding algorithm of 92% was improved to 98%. The performance of models trained on complicated algorithms can be matched or beaten by pre-training with the poorer performing algorithms and a small quantity of high-quality data. The introduction of statistic significance analysis for deep learning models helps to validate the performance improvements measured. The method offers a cost-effective and fast approach to achieving high-accuracy models while minimizing the cost and effort of acquiring high-quality training data.


Adaptive Greedy versus Non-adaptive Greedy for Influence Maximization

Journal of Artificial Intelligence Research

We consider the adaptive influence maximization problem: given a network and a budget k, iteratively select k seeds in the network to maximize the expected number of adopters. In the full-adoption feedback model, after selecting each seed, the seed-picker observes all the resulting adoptions. In the myopic feedback model, the seed-picker only observes whether each neighbor of the chosen seed adopts. Motivated by the extreme success of greedy-based algorithms/heuristics for influence maximization, we propose the concept of greedy adaptivity gap, which compares the performance of the adaptive greedy algorithm to its non-adaptive counterpart. Our first result shows that, for submodular influence maximization, the adaptive greedy algorithm can perform up to a (1 โˆ’ 1/e)-fraction worse than the non-adaptive greedy algorithm, and that this ratio is tight. More specifically, on one side we provide examples where the performance of the adaptive greedy algorithm is only a (1โˆ’1/e) fraction of the performance of the non-adaptive greedy algorithm in four settings: for both feedback models and both the independent cascade model and the linear threshold model. On the other side, we prove that in any submodular cascade, the adaptive greedy algorithm always outputs a (1 โˆ’ 1/e)-approximation to the expected number of adoptions in the optimal non-adaptive seed choice. Our second result shows that, for the general submodular diffusion model with full-adoption feedback, the adaptive greedy algorithm can outperform the non-adaptive greedy algorithm by an unbounded factor. Finally, we propose a risk-free variant of the adaptive greedy algorithm that always performs no worse than the non-adaptive greedy algorithm.


A Lower Bound on DNNF Encodings of Pseudo-Boolean Constraints

arXiv.org Artificial Intelligence

Two major considerations when encoding pseudo-Boolean (PB) constraints into SAT are the size of the encoding and its propagation strength, that is, the guarantee that it has a good behaviour under unit propagation. Several encodings with propagation strength guarantees rely upon prior compilation of the constraints into DNNF (decomposable negation normal form), BDD (binary decision diagram), or some other sub-variants. However it has been shown that there exist PB-constraints whose ordered BDD (OBDD) representations, and thus the inferred CNF encodings, all have exponential size. Since DNNFs are more succinct than OBDDs, preferring encodings via DNNF to avoid size explosion seems a legitimate choice. Yet in this paper, we prove the existence of PB-constraints whose DNNFs all require exponential size.


An Influence Network Model to Study Discrepancies in Expressed and Private Opinions

arXiv.org Artificial Intelligence

In many social situations, a discrepancy arises between an individual's private and expressed opinions on a given topic. Motivated by Solomon Asch's seminal experiments on social conformity and other related socio-psychological works, we propose a novel opinion dynamics model to study how such a discrepancy can arise in general social networks of interpersonal influence. Each individual in the network has both a private and an expressed opinion: an individual's private opinion evolves under social influence from the expressed opinions of the individual's neighbours, while the individual determines his or her expressed opinion under a pressure to conform to the average expressed opinion of his or her neighbours, termed the local public opinion. General conditions on the network that guarantee exponentially fast convergence of the opinions to a limit are obtained. Further analysis of the limit yields several semi-quantitative conclusions, which have insightful social interpretations, including the establishing of conditions that ensure every individual in the network has such a discrepancy. Last, we show the generality and validity of the model by using it to explain and predict the results of Solomon Asch's seminal experiments.


Incremental Sparse Bayesian Ordinal Regression

arXiv.org Machine Learning

Ordinal Regression (OR) aims to model the ordering information between different data categories, which is a crucial topic in multi-label learning. An important class of approaches to OR models the problem as a linear combination of basis functions that map features to a high dimensional non-linear space. However, most of the basis function-based algorithms are time consuming. We propose an incremental sparse Bayesian approach to OR tasks and introduce an algorithm to sequentially learn the relevant basis functions in the ordinal scenario. Our method, called Incremental Sparse Bayesian Ordinal Regression (ISBOR), automatically optimizes the hyper-parameters via the type-II maximum likelihood method. By exploiting fast marginal likelihood optimization, ISBOR can avoid big matrix inverses, which is the main bottleneck in applying basis function-based algorithms to OR tasks on large-scale datasets. We show that ISBOR can make accurate predictions with parsimonious basis functions while offering automatic estimates of the prediction uncertainty. Extensive experiments on synthetic and real word datasets demonstrate the efficiency and effectiveness of ISBOR compared to other basis function-based OR approaches.


Learning with Abandonment

arXiv.org Machine Learning

Consider a platform that wants to learn a personalized policy for each user, but the platform faces the risk of a user abandoning the platform if she is dissatisfied with the actions of the platform. For example, a platform is interested in personalizing the number of newsletters it sends, but faces the risk that the user unsubscribes forever. We propose a general thresholded learning model for scenarios like this, and discuss the structure of optimal policies. We describe salient features of optimal personalization algorithms and how feedback the platform receives impacts the results. Furthermore, we investigate how the platform can efficiently learn the heterogeneity across users by interacting with a population and provide performance guarantees.


Influence Maximization with $\varepsilon$-Almost Submodular Threshold Functions

Neural Information Processing Systems

Influence maximization is the problem of selecting $k$ nodes in a social network to maximize their influence spread. The problem has been extensively studied but most works focus on the submodular influence diffusion models. In this paper, motivated by empirical evidences, we explore influence maximization in the non-submodular regime. In particular, we study the general threshold model in which a fraction of nodes have non-submodular threshold functions, but their threshold functions are closely upper- and lower-bounded by some submodular functions (we call them $\varepsilon$-almost submodular). We first show a strong hardness result: there is no $1/n^{\gamma/c}$ approximation for influence maximization (unless P = NP) for all networks with up to $n^{\gamma}$ $\varepsilon$-almost submodular nodes, where $\gamma$ is in (0,1) and $c$ is a parameter depending on $\varepsilon$. This indicates that influence maximization is still hard to approximate even though threshold functions are close to submodular. We then provide $(1-\varepsilon)^{\ell}(1-1/e)$ approximation algorithms when the number of $\varepsilon$-almost submodular nodes is $\ell$. Finally, we conduct experiments on a number of real-world datasets, and the results demonstrate that our approximation algorithms outperform other baseline algorithms.